Establishing resilience-targeted prediction models of rainfall for transportation infrastructures for three demonstration regions in China

Rainstorms are among the meteorological disasters worldwide that threaten the safety of transportation infrastructure and the connectivity of transportation systems. To support the resilience assessment of transportation infrastructure in three representative regions (Sichuan–Chongqing, the Yangtze River Delta, and Beijing-Tianjin-Hebei-Shandong), rainfall data spanning more than 40 years were collected, the temporal distribution of rainfall was analyzed, and prediction equations of rainfall were established. To this end, a probability density function (PDF) is assigned to the rainfall data by fitting the frequency distribution histogram. Using the assigned PDF, the rainfall data are transformed into standard normal space, where the prediction equations are obtained by regression and their prediction accuracy is tested. The results show that: (1) The frequency of rainfall in the three regions follows a lognormal distribution, on the basis of which the prediction equations of rainfall can be established in standard normal space. The regression errors show no marked dependence on the independent variables, and the significance analysis indicates that the proposed equations are plausible for predicting rainfall in the three regions. (2) According to the frequency of rainfall and the return period of precipitation concentration, the Yangtze River Delta region has a higher risk of rainstorm disaster than the other two regions. (3) Over 1980–2021, the Sichuan–Chongqing region saw an increase in yearly rainfall but a decrease in rainstorm disasters, whereas the other two regions experienced a consistent rise in both metrics.

with a multi-year average of 1.3 rainstorm days across the entire area. Since 1961, the precipitation, number of days, and intensity of rainstorms have shown a decreasing trend in most parts of Beijing-Tianjin-Hebei, and after the 2000s the number of rainstorms decreased further [21][22][23]. In the middle and lower reaches of the Yangtze River, rainstorm events mostly last 2-3 days, with a maximum duration of 8 days. These events occur predominantly in summer, particularly in June and July. Rainstorm events in this area exhibit both long-term trends and inter-decadal variations, with a notable increase in occurrence frequency over the past 58 years. Furthermore, heavy and extremely heavy rainstorms have increased significantly in all seasons over the last decade [24][25][26].
The ultimate goal of disaster research is to support pre-disaster prediction and rapid post-disaster response to risks facing regional transportation infrastructure. Because the mechanism of disaster formation is complex and rainstorm disasters are relatively few in number, the methods and accuracy of pre-disaster prediction have been a focus of attention. Wang Xiuying et al. [27] established a short-term heavy precipitation forecasting model for Pu'er City based on multiple linear regression and showed that the model has robust predictive capability, making it suitable for short-term heavy precipitation forecasting and warning in Pu'er. Milan Gocic et al. [28] employed linear regression to forecast the precipitation trend using monthly precipitation data from 29 stations over 1946-2012 and found a consistent upward trend in annual precipitation in Serbia over this period. He Xinguang et al. [29] developed a rainfall forecasting model from monthly historical rainfall data and climate indices by combining multi-resolution analysis (MRA) with a multiple linear regression (MLR) model; the MRA-based model provided considerably more accurate monthly rainfall forecasts than the traditional regression model for all selected stations in South Australia. S.K. Chandniha et al. [30] applied the statistical downscaling model (SDSM), which is based on the MLR technique, to assess likely future monthly rainfall in the Piperiya watershed of Chhattisgarh state, India, and reported that the approach can help study the effect of climate change on expected rainfall in that area. In recent years, a variety of other prediction models, including Markov models, grey system models, and spectral analysis models, have gradually been applied to rainstorm disaster prediction [31][32][33]. Sun Caizhi et al. [34] took 50 years of precipitation data from a hydrological station in Shanxi Province as a case study and used a weighted Markov chain model to forecast future wet and dry variations of precipitation, with favorable results. Feng Lihua [35] incorporated forecast factors from the previous period into the calculation, extending grey clustering from classification to analysis and prediction, and applied it to predicting precipitation trends. Li Ping et al. [36] applied spectral analysis to precipitation data from 1964 to 2003 at Caizuizi Station in the Naolihe River Basin; the results revealed two main periodicities in annual precipitation (about three and nine years), reflecting the climate change patterns of the area.
Furthermore, with the continuous development of computing technology, machine learning algorithms, especially neural network models, have been widely applied to precipitation forecasting [37][38]. Gou Zhijing et al. [39] established a genetic neural network prediction model using daily surface climate data from 13 stations in Tianjin from 1951 to 2006 and conducted an experiment with rainfall level as the decision attribute; the method predicted all precipitation levels more accurately than a traditional neural network algorithm. Liu Yang et al. [40] used a multi-hidden-layer neural network to establish the nonlinear relationship between rainfall and various parameters for short-term rainfall forecasting; the model predicted more than 95% of rainfall events, with a misreporting rate of only about 20%.
Unlike precise mathematical analysis based on large regional datasets, disaster emergency response for regional transportation infrastructure networks mainly calls for convenient prediction of the scope and intensity of disaster impacts. To meet the needs of resilience assessment of transportation infrastructure networks in the Beijing-Tianjin-Hebei-Shandong, Yangtze River Delta, and Sichuan-Chongqing regions, this paper collects and statistically analyzes about 40 years of precipitation and climate change data. It studies the temporal distribution characteristics of rainfall in these three regions together with the characteristic parameters of rainstorm disasters, and then develops a rainstorm disaster prediction model based on these findings.

Data
Daily and annual rainfall data spanning 42 years were collected for the three representative demonstration areas, Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and the Yangtze River Delta, and were used to analyze the monthly and annual distributions of rainfall, respectively, as presented in Table 1. Specifically, the data comprise daily and annual rainfall records from 30 meteorological stations in the Sichuan-Chongqing region, 34 meteorological stations in the Beijing-Tianjin-Hebei-Shandong region, and 30 meteorological stations in the Yangtze River Delta region.
In this study, rainfall classifications follow the criteria in "the standard of rainfall level" [41]: a rainstorm is a 24-h rainfall of at least 50 mm and less than 100 mm, a heavy rainstorm is a 24-h rainfall of at least 100 mm and less than 250 mm, and an extremely heavy rainstorm is a 24-h rainfall of 250 mm or more. Table 2 lists the selected rainstorm characteristic variables, which are used to analyze the temporal patterns of rainfall variation across the three regions.
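The grading thresholds above can be applied mechanically to daily records. The following is a minimal Python sketch, assuming 24-h totals (R24) are given in millimetres; the function name and sample values are hypothetical and are not taken from the study's data.

```python
def classify_24h_rainfall(r24_mm: float) -> str:
    """Return the rainstorm grade of a 24-h rainfall total R24 (mm).

    Thresholds follow the grading quoted in the text:
    [50, 100) rainstorm, [100, 250) heavy rainstorm, >= 250 extremely heavy.
    """
    if r24_mm >= 250:
        return "extremely heavy rainstorm"
    if r24_mm >= 100:
        return "heavy rainstorm"
    if r24_mm >= 50:
        return "rainstorm"
    return "below rainstorm level"


# Hypothetical daily totals (mm) at one station
daily_r24 = [12.0, 50.0, 99.9, 100.0, 260.5]
print([classify_24h_rainfall(r) for r in daily_r24])
```

Counting the graded days per calendar month or per year is then enough to build frequency distributions of the kind summarized in the figures.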
where R24 represents the rainfall within 24 h.
Figure 2 illustrates the monthly distribution of the frequency of rainstorms, heavy rainstorms, and extremely heavy rainstorms and above in the three demonstration areas. As Fig. 2 shows, the monthly frequency of rainstorms, heavy rainstorms, and extremely heavy rainstorms peaks in July in all three areas. In the Sichuan-Chongqing region, rainstorm disasters are concentrated from May to September, with heavy rainstorm disasters particularly prominent from June to September. In the Beijing-Tianjin-Hebei-Shandong region, rainstorms and heavy rainstorm disasters occur mainly from June to September. In contrast, the Yangtze River Delta region has a longer concentration period of rainstorms and heavy rainstorm disasters, typically from May to October. The monthly distribution of extremely heavy rainstorm disasters shows no clear regularity in the three areas, with such events occurring only from June to October. Moreover, the monthly occurrence of rainstorms, heavy rainstorms, and extremely heavy rainstorm disasters is typically higher in the Yangtze River Delta region than in the other two regions. During July and August, the Beijing-Tianjin-Hebei-Shandong region experiences a higher incidence of rainstorms and heavy rainstorm disasters than the Sichuan-Chongqing region.
Figure 3 illustrates the temporal distribution of annual rainfall in the three demonstration areas, revealing significant interannual fluctuations. A comparison of the three areas shows that, from 1980 to 2021, the yearly rainfall in the Yangtze River Delta region is much higher than in the other two areas, and the yearly rainfall in the Sichuan-Chongqing region is higher than that in the Beijing-Tianjin-Hebei-Shandong region. In addition, the linear trend lines of all three areas increase gradually, with slopes of 111.67, 195.95, and 273.78 in the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta regions, respectively. This indicates that annual rainfall increases fastest in the Yangtze River Delta region, followed by the Beijing-Tianjin-Hebei-Shandong region, and slowest in the Sichuan-Chongqing region.
Table 2. Characteristic variables of rainfall.
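The slope of a trend line gives the average change in annual rainfall per year. The sketch below assumes the trend lines are ordinary least-squares fits of annual rainfall against year (the text does not name the fitting method), and the five-year series is hypothetical.

```python
def linear_trend_slope(years, totals):
    """Ordinary least-squares slope of annual rainfall against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(totals) / n
    # Slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, totals))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


years = list(range(1980, 1985))
totals = [1000.0, 1010.0, 1020.0, 1030.0, 1040.0]  # hypothetical annual totals
print(linear_trend_slope(years, totals))
```

The slope carries the units of the rainfall series per year, so slopes from different regions are comparable only when the series are aggregated the same way.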

Methodology
www.nature.com/scientificreports/
The rainfall prediction models are developed in the following steps.
Step 1. Assign a specific probability distribution type to each dataset based on the frequency distribution histogram of the original rainfall data in the three demonstration areas, and perform a goodness-of-fit test on the assigned probability density function by means of the chi2gof test. The test statistic is
$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} \quad (1)$$
where $O_i$ are the observed counts, $E_i$ are the expected counts based on the hypothesized distribution, and $N$ is the number of bins.
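The chi-square statistic behind the goodness-of-fit test sums $(O_i - E_i)^2/E_i$ over the bins. A minimal Python sketch of that sum follows. The bin counts are hypothetical, and this is not MATLAB's chi2gof, which also handles the binning and the p-value.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 20, 40]  # hypothetical bin counts
expected = [20, 20, 20, 40]  # hypothetical counts under the fitted distribution
print(chi_square_statistic(observed, expected))
```

Large values mean the observed histogram departs from the hypothesized distribution. The statistic is compared with a chi-square critical value whose degrees of freedom are the number of bins minus one minus the number of fitted parameters.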
Step 2. Transform the original rainfall data into standard normal space using the assigned PDF, so as to meet the normality assumption of multiple linear regression. Once a suitable probability distribution has been determined for each demonstration area, the sample values of the original data can be standardized using Eq. (2):
$$v_i = \Phi^{-1}\left[F_{\theta_i}(\theta_i)\right] \quad (2)$$
where $\theta_i$ denotes the data from the three regions: i = 1 for the Sichuan-Chongqing region, i = 2 for the Beijing-Tianjin-Hebei-Shandong region, and i = 3 for the Yangtze River Delta region. $F_{\theta_i}(\theta_i)$ is the marginal cumulative distribution function fitted to the data $\theta_i$, and $\Phi^{-1}[\cdot]$ is the inverse of the standard normal cumulative distribution function. The variable $v_i$ is therefore a series of standard normal random variables. Equation (2) transforms the $\theta_i$ data into $v_i$, which is then used for multiple linear regression in standard normal space, laying the foundation for establishing prediction equations for the transformed data [42].
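The standardization of Eq. (2) passes each observation through the fitted CDF and then through the inverse standard normal CDF. The sketch below assumes a one-parameter exponential CDF with its rate estimated as 1/mean (maximum likelihood); the sample values are hypothetical. It also clips probabilities away from 0 and 1 so that zero rainfall does not map to minus infinity, which is an assumption and not something stated in the text.

```python
import math
from statistics import NormalDist


def exponential_cdf(x, rate):
    """CDF of a one-sided (x >= 0) exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def to_standard_normal(samples, cdf, eps=1e-9):
    """Map each sample theta to v = Phi^{-1}[F(theta)].

    Probabilities are clipped to [eps, 1 - eps] (an assumption) so that
    F = 0 or F = 1 does not produce an infinite value.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = []
    for theta in samples:
        p = min(max(cdf(theta), eps), 1.0 - eps)
        out.append(std_normal.inv_cdf(p))
    return out


rainfall = [0.1, 0.3, 0.7, 1.5]                  # hypothetical daily values
rate = 1.0 / (sum(rainfall) / len(rainfall))     # MLE of the exponential rate
v = to_standard_normal(rainfall, lambda x: exponential_cdf(x, rate))
print([round(z, 3) for z in v])
```

Because both CDFs are monotone, the transformation preserves the ranking of the observations. Only the shape of the distribution changes.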
Step 3. Select three input variables for models in the three demonstration areas using correlation analysis.The prediction models established in this paper will be used as the basis for predicting and evaluating rainfall in the three demonstration zones.In order to improve the practicability of the model, specific criteria have been established for selecting variables: the input variables should not be more than three; the data for these variables should be readily accessible and operationally feasible to collect; and the chosen variables should be representative of the general pattern of rainfall.
In statistics, the correlation between two variables can be measured by the correlation coefficient, which takes values in the range [-1, 1]. If there is a positive linear relationship between two variables, the correlation coefficient is positive; if there is a negative linear relationship, it is negative. The closer the absolute value of the correlation coefficient is to 1, the stronger the correlation between the two variables; the closer it is to 0, the weaker the correlation.
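For reference, the Pearson coefficient is $r = \sum (x - \bar{x})(y - \bar{y}) / \sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}$. A minimal Python sketch follows; the toy sequences are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly negative relationship
```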
Step 4. Establish the prediction models and assess their performance using various inspection methods. In standard normal space, multiple linear regression is applied in the SPSS software to construct rainfall prediction models for the three demonstration areas. Three key assumptions are made in this approach: Assumption 1: the selected samples are independent of each other; Assumption 2: the independent variables exhibit no multicollinearity; Assumption 3: the residuals follow a normal distribution [43].
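Under these three assumptions, a prediction model containing three independent variables is established by multiple linear regression, which can be represented as follows:
$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + \varepsilon$$
In the equation, $y$ is the dependent variable, $x_1, x_2, \ldots, x_n$ are the independent variables (n = 3 in this study), $a_0, a_1, a_2, \ldots, a_n$ represent regression coefficients, and $\varepsilon$ denotes random error.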
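In the study, the coefficients of $y = a_0 + a_1 x_1 + \cdots + a_n x_n + \varepsilon$ were estimated in SPSS. The sketch below is a dependency-free Python illustration of the same least-squares estimate. It solves the normal equations $(A^T A)a = A^T y$ by Gauss-Jordan elimination, where $A$ is the input matrix with a leading column of ones. The synthetic data are hypothetical, and a production analysis would use a numerically robust library solver instead.

```python
def fit_ols(X, y):
    """Least-squares coefficients [a0, a1, ..., an] for y = a0 + sum(aj * xj) + eps."""
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    rows, p = len(A), len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(rows))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining pivot into place
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2 + 0.5*x3
X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
y = [1 + 2 * a - 3 * b + 0.5 * c for a, b, c in X]
print([round(c, 6) for c in fit_ols(X, y)])
```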

Probability distribution assignment
Prior to regression analysis, data transformation should be considered if the data do not follow a normal distribution. Using rainfall data from the three demonstration areas, we plotted frequency histograms of the raw data, as depicted in Fig. 5. The figure reveals that the original rainfall data in the three demonstration areas share similar distribution characteristics: the distribution is highly uneven, with most values concentrated below 0.5 inches. At the same time, there are differences among the three areas. The proportion of low rainfall values is highest in the Sichuan-Chongqing region, followed by the Beijing-Tianjin-Hebei-Shandong region, and lowest in the Yangtze River Delta region, indicating that rainfall in the Sichuan-Chongqing region is dominated by drizzle. The ranking of maximum rainfall among the three regions, however, is exactly the opposite. Because regression analysis requires normally distributed data, we assign a specific probability distribution type to each region's dataset, which lays the foundation for transforming the data into standard normal space. Table 3 summarizes the parameter statistics for the three demonstration areas. Notably, the raw rainfall data in the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta demonstration areas all conform to the single-sided exponential distribution. The probability density function curves and cumulative probability distribution curves are plotted in Figs. 5 and 6, respectively, and show close agreement between the original data from the three regions and their designated probability distribution types. In the chi2gof test, the returned test decision is 0 for all three regions, meaning that chi2gof does not reject the null hypothesis at the default 5% significance level, which indicates a good fit between the data and the theoretical distribution.
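The whole Step 1 check can be sketched as follows. Like MATLAB's chi2gof, the sketch returns a decision h, where h = 0 means the fitted distribution is not rejected. Several choices here are assumptions and not the study's settings. The exponential has one parameter (the rate, fitted as n/sum). The data are split into k equal-probability bins. The 5% critical value comes from the Wilson-Hilferty approximation instead of an exact chi-square quantile. The two test datasets are synthetic.

```python
import math


def chi2_crit_wilson_hilferty(df, z=1.6448536269514722):
    """Approximate upper-tail chi-square critical value (default: 5% level)
    via the Wilson-Hilferty cube-root normal approximation."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def exponential_gof(data, k=10):
    """Chi-square goodness-of-fit of a one-parameter exponential distribution.

    Returns (h, stat, df): h = 1 rejects the fit at about the 5% level.
    """
    n = len(data)
    rate = n / sum(data)  # maximum-likelihood estimate of the rate
    # Inner bin edges at the fitted quantiles j/k, j = 1..k-1
    edges = [-math.log(1.0 - j / k) / rate for j in range(1, k)]
    observed = [0] * k
    for x in data:
        observed[sum(x > e for e in edges)] += 1  # bin index = edges below x
    expected = n / k  # each bin has probability 1/k under the fitted model
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1 - 1  # bins - 1 - one estimated parameter
    return int(stat > chi2_crit_wilson_hilferty(df)), stat, df


# Synthetic data: exponential quantiles (should pass) vs. uniform values (should fail)
ideal = [-math.log(1.0 - (i + 0.5) / 200) for i in range(200)]
uniform = [0.05 * i for i in range(1, 201)]
print(exponential_gof(ideal)[0], exponential_gof(uniform)[0])
```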
In the equation, we determined two parameters by fitting the data, so we have a = b = 4 , x(i) represents a specific data point.

Correlation analysis
To assess the correlation between the input variables and rainfall, this section conducts a correlation analysis on each candidate input variable, as shown in Tables 4, 5 and 6. Taking the Sichuan-Chongqing region as an example, Table 4 presents the Pearson correlation coefficients among the variables of the rainfall system. The table includes not only the dependent variable, rainfall PRCP, but also the candidate independent variables: mean temperature TEMP, dew point temperature DEWP, sea level pressure SLP, station pressure STP, mean wind speed WDSP, maximum sustained wind speed MXSPD, maximum air temperature MAX, and minimum air temperature MIN. Table 4 shows that the correlation coefficient between PRCP and DEWP is 0.305, indicating a positive correlation between the two variables. However, the correlation coefficients between DEWP and TEMP, MAX, and MIN are 0.950, 0.854, and 0.977, respectively, indicating strong correlations between DEWP and these three variables, which therefore should not be placed in the same model. Applying the input-variable criteria in the same way, we select the other two input variables, SLP and WDSP, whose correlation coefficients with PRCP are -0.293 and 0.119. Although these correlations are weak, they still indicate some association with rainfall. Therefore, the input variables of the rainfall prediction model for the Sichuan-Chongqing region are dew point temperature DEWP, sea level pressure SLP, and mean wind speed WDSP. Similarly, the input variables for the Beijing-Tianjin-Hebei-Shandong region are dew point temperature DEWP, sea level pressure SLP, and mean wind speed WDSP, while those for the Yangtze River Delta region are mean temperature TEMP, sea level pressure SLP, and maximum sustained wind speed MXSPD.
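One way to mechanize this screening is a greedy rule. Rank candidates by the absolute value of their correlation with rainfall, then skip any candidate that is strongly correlated with an input already selected. This rule is a hypothetical illustration, not the authors' stated procedure. In particular, it ignores the data-accessibility criterion of Step 3. The 0.8 collinearity limit and the TEMP-PRCP value of 0.25 are hypothetical. The other coefficients are those quoted in the text for the Sichuan-Chongqing region, and pairs missing from the table are treated as uncorrelated (a simplifying assumption).

```python
def screen_inputs(r_with_target, r_between, max_inputs=3, collinear_limit=0.8):
    """Greedy input screening (hypothetical rule, for illustration only).

    Candidates are ranked by |r| with rainfall; a candidate is skipped if its
    |r| with any already selected input exceeds collinear_limit. Pairs missing
    from r_between are treated as uncorrelated (a simplifying assumption).
    """
    ranked = sorted(r_with_target, key=lambda name: abs(r_with_target[name]), reverse=True)
    selected = []
    for cand in ranked:
        if len(selected) == max_inputs:
            break
        if all(abs(r_between.get(frozenset((cand, s)), 0.0)) <= collinear_limit
               for s in selected):
            selected.append(cand)
    return selected


# DEWP, SLP, WDSP vs PRCP and DEWP vs TEMP are quoted in the text;
# the TEMP vs PRCP value (0.25) is hypothetical.
r_with_prcp = {"DEWP": 0.305, "SLP": -0.293, "TEMP": 0.25, "WDSP": 0.119}
r_pairs = {frozenset(("DEWP", "TEMP")): 0.950}
print(screen_inputs(r_with_prcp, r_pairs))
```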

Multiple linear regression
This study uses the transformed data from the three demonstration areas as the respective samples. Rainfall PRCP is taken as the dependent variable, while dew point temperature DEWP or mean temperature TEMP, sea level pressure SLP, and mean wind speed WDSP or maximum sustained wind speed MXSPD are used as the independent variables to carry out the multiple linear regression analysis and establish the prediction models.
The regression coefficients and significance test results are shown in Table 7. All variables in the three regions have significance levels below 0.05, so they pass the significance test and are statistically significant. In the Sichuan-Chongqing region, the unstandardized regression coefficients for DEWP, SLP, and WDSP are 0.019 > 0, −0.013 < 0, and 0.087 > 0, respectively, indicating that rainfall is positively correlated with DEWP and WDSP and negatively correlated with SLP. In the Beijing-Tianjin-Hebei-Shandong region, the unstandardized regression coefficients for DEWP, SLP, and WDSP are 0.020 > 0, −0.003 < 0, and 0.029 > 0, respectively, indicating that higher DEWP and WDSP lead to more rainfall, while higher SLP leads to less rainfall. In the Yangtze River Delta region, the unstandardized regression coefficients for TEMP, SLP, and MXSPD are −0.026 < 0, −0.075 < 0, and 0.018 > 0, respectively, indicating that rainfall decreases as TEMP and SLP increase and increases as MXSPD increases. In summary, with all variables expressed in standard normal space and $a_0$ denoting the fitted intercept of each regional model, the multiple linear regression model for the Sichuan-Chongqing region is:
$$\mathrm{PRCP} = a_0 + 0.019\,\mathrm{DEWP} - 0.013\,\mathrm{SLP} + 0.087\,\mathrm{WDSP}$$
The multiple linear regression model for the Beijing-Tianjin-Hebei-Shandong region is:
$$\mathrm{PRCP} = a_0 + 0.020\,\mathrm{DEWP} - 0.003\,\mathrm{SLP} + 0.029\,\mathrm{WDSP}$$
The multiple linear regression model for the Yangtze River Delta region is:
$$\mathrm{PRCP} = a_0 - 0.026\,\mathrm{TEMP} - 0.075\,\mathrm{SLP} + 0.018\,\mathrm{MXSPD}$$
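A fitted model of this kind predicts rainfall in standard normal space. To obtain a value on the original scale, the prediction can be mapped back through the inverse of Eq. (2), $\theta = F^{-1}[\Phi(v)]$. The text does not describe this back-transformation, so it is an assumption here. The sketch uses the Sichuan-Chongqing slopes quoted above, a hypothetical intercept of 0.0, a hypothetical exponential rate of 2.0, and hypothetical transformed inputs.

```python
import math
from statistics import NormalDist

# Sichuan-Chongqing slopes quoted in the text (standard normal space)
COEFS_SC = {"DEWP": 0.019, "SLP": -0.013, "WDSP": 0.087}


def predict_v(v_inputs, coefs, intercept):
    """Linear predictor in standard normal space: a0 + sum(aj * vj)."""
    return intercept + sum(coefs[name] * v_inputs[name] for name in coefs)


def back_transform_exponential(v, rate):
    """Assumed back-transformation theta = F^{-1}[Phi(v)] for an exponential F."""
    p = NormalDist().cdf(v)
    return -math.log(1.0 - p) / rate


# Hypothetical transformed inputs; the intercept 0.0 and rate 2.0 are placeholders
v_hat = predict_v({"DEWP": 1.0, "SLP": -0.5, "WDSP": 0.2}, COEFS_SC, intercept=0.0)
print(round(v_hat, 4), back_transform_exponential(v_hat, rate=2.0))
```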

Model performance testing
This paper employs various statistical methods, including the F-test, the goodness-of-fit test, the D-W test, and the collinearity test, to assess the established prediction models; the test equations are listed in Table 8 and the results are displayed in Table 9.
The overall significance of each model is analyzed with the F-test, which tests the joint influence of all explanatory variables on the explained variable to determine whether the multiple linear regression equation holds. As shown in Table 9, the F-test significance level is below 0.05 in all three regions, so the null hypothesis that none of the independent variables significantly affects rainfall is rejected (its probability is approximately 0). Hence, a linear relationship exists between the dependent variable and the independent variables, and a multiple linear regression equation can be formulated.
The coefficient of determination R² measures the goodness of fit between the data and the regression equation and is an important indicator of the relationship between the explanatory variables and the explained variable. Its value lies in the range [0, 1], with values closer to 1 indicating a better fit. Table 9 shows relatively small R² values for the multiple linear regression equations in the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta regions, namely 0.109, 0.101, and 0.087, respectively. These values indicate a modest goodness of fit: the linear models explain only about 9% to 11% of the variance of the transformed rainfall, although rainfall does show some linear correlation with DEWP/TEMP, SLP, and WDSP/MXSPD.
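Both statistics follow from the sums of squares: $R^2 = 1 - SSE/SST$ and $F = (SSR/k)/(SSE/(n-k-1))$ for $n$ samples and $k$ independent variables, where $SSR = SST - SSE$ for a least-squares fit with an intercept. A minimal Python sketch follows. The demo uses the least-squares fitted values of a toy one-predictor regression (y on x = 1..5), not the study's data.

```python
def r2_and_f(y, y_hat, k):
    """Return (R^2, F) for a least-squares fit with intercept and k inputs."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)            # total sum of squares
    sse = sum((v - f) ** 2 for v, f in zip(y, y_hat))  # residual sum of squares
    ssr = sst - sse                                    # regression sum of squares
    r2 = 1.0 - sse / sst
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return r2, f_stat


# Toy data: y regressed on x = 1..5 gives fitted line 0.6 + 0.8 x
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
print(r2_and_f(y, y_hat, k=1))
```

With many daily samples, even a small R² can yield a large F, which is consistent with the models being significant overall while explaining only a modest share of the variance.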
The independence of the residuals is analyzed with the Durbin-Watson (D-W) test. The D-W values in Table 9 lie between 1.5 and 2.5, close to 2, indicating no significant first-order autocorrelation in the residuals of each regional model. This supports the first assumption of the multiple linear regression model.
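The Durbin-Watson statistic is $D = \sum_{i=2}^{n}(u_i - u_{i-1})^2 / \sum_{i=1}^{n} u_i^2$. It ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation. A minimal Python sketch with illustrative residual sequences:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of a residual sequence u_1..u_n."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(u * u for u in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: negative autocorrelation
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # constant residuals: strong positive autocorrelation
```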
The collinearity test determines whether multicollinearity is present by examining the extent to which each explanatory variable is explained by all the other explanatory variables in the equation. Each explanatory variable has a tolerance T and a variance inflation factor VIF = 1/T, which are used to test for collinearity between the independent variables. As shown in Table 7, the tolerances of all variables in the three regions are greater than 0.1 and the VIF values are all less than 5, which indicates that there is no multicollinearity or extremely strong correlation among the three independent variables selected in each region, confirming the validity of the second assumption of the multiple linear regression model.
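With three inputs, each tolerance can be computed directly from the pairwise correlations. For $x_j$ regressed on the other two inputs, $R_j^2 = (r_a^2 + r_b^2 - 2 r_a r_b r_c)/(1 - r_c^2)$. Here $r_a$ and $r_b$ are the correlations of $x_j$ with the other two inputs, and $r_c$ is the correlation between those two. Then $T = 1 - R_j^2$ and $VIF = 1/T$. A minimal Python sketch follows; the correlation values are hypothetical.

```python
def vif_three(r12, r13, r23):
    """Variance inflation factors of x1, x2, x3 from their pairwise correlations."""

    def vif(ra, rb, rc):
        # R_j^2 of x_j on the other two inputs; tolerance T = 1 - R_j^2
        r2 = (ra ** 2 + rb ** 2 - 2.0 * ra * rb * rc) / (1.0 - rc ** 2)
        tolerance = 1.0 - r2
        return 1.0 / tolerance

    return vif(r12, r13, r23), vif(r12, r23, r13), vif(r13, r23, r12)


# Hypothetical pairwise correlations among three inputs
print(vif_three(0.3, -0.2, 0.1))
```

Uncorrelated inputs give VIF = 1. The VIF grows without bound as one input becomes a linear combination of the others.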

Residual distribution state test
Probability-Probability (P-P) plots are used to test the normality of a data distribution; linear regression requires that the regression residuals closely approximate a normal distribution. Figure 7 displays the P-P plots of the standardized residuals in the three demonstration regions, with a green diagonal reference line and a yellow curve representing the data distribution. The sample points cluster closely around the diagonal line, indicating that the residuals in the three regions conform to a normal distribution and confirming the validity of the third assumption of the multiple linear regression model. To further evaluate the effectiveness of the prediction equations, test the independence of the residual series, and assess how well the regression equations fit, residual analysis is commonly used for diagnosis. Its main element is to check whether the residuals, plotted against different variables, are distributed around y = 0 overall. With the independent variables of each region's regression equation on the horizontal axis and the residuals on the vertical axis, we plotted the corresponding residual distributions. As shown in Fig. 8, the fitted trend lines are close to or coincide with y = 0, and the residuals are distributed around 0 throughout, with no obvious upward or downward tendency, indicating that the regression models fit the data well. In addition, most of the residuals are uniformly concentrated in the [−2, 2] interval without linear changes, which further indicates that the residuals show no serial correlation and are independent.
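A normal P-P plot pairs each sorted standardized residual's theoretical cumulative probability $\Phi(z)$ with its empirical cumulative probability. The sketch below uses the plotting position $(i - 0.5)/n$, which is one common choice and an assumption here; SPSS may use a different formula. It also reports the largest gap from the diagonal. The demo residuals are synthetic.

```python
from statistics import NormalDist


def pp_points(std_residuals):
    """(theoretical, empirical) cumulative probabilities for a normal P-P plot."""
    std_normal = NormalDist()
    z_sorted = sorted(std_residuals)
    n = len(z_sorted)
    return [(std_normal.cdf(z), (i - 0.5) / n) for i, z in enumerate(z_sorted, start=1)]


def max_pp_deviation(std_residuals):
    """Largest vertical distance between the P-P points and the diagonal."""
    return max(abs(t - e) for t, e in pp_points(std_residuals))


# Synthetic residuals placed exactly at normal quantiles: points lie on the diagonal
resid = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
print(max_pp_deviation(resid))
```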

Prediction for three regions
The rainfall prediction models constructed by multiple linear regression are applied separately to the three demonstration areas to predict their rainfall data in 2022. According to the National Standard of the People's Republic of China "Standard for hydrological information and hydrological forecasting" (GB/T22482-2008) [44], the accuracy of medium- and long-term rainfall forecasts is evaluated against a permissible error of 20% of the multi-year variability of the measured values. Forecasts whose errors are smaller than this permissible error are designated as qualified forecasts, and the qualification rate is defined as the ratio of the number of qualified forecasts to the total number of forecasts, as shown in Eq. (9):
$$QR = \frac{m}{n} \times 100\% \quad (9)$$
where QR is the qualification rate (%, to one decimal place), m is the number of qualified forecasts, and n is the total number of forecasts. The results reveal that the Yangtze River Delta region has the highest forecast qualification rate at 77.8%, followed by the Sichuan-Chongqing region at 61.7%, while the Beijing-Tianjin-Hebei-Shandong region has the lowest at 58.6%. The qualification rates in all three areas exceed 50%, indicating that the prediction models have some guiding value for predicting rainfall in the three demonstration zones.
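The qualification rate can be computed as sketched below. The text does not define "multi-year variability". This sketch assumes it is the range (maximum minus minimum) of the historical measured values, which is an assumption. All numbers in the demo are hypothetical.

```python
def qualified_rate(forecasts, observations, history):
    """QR = m / n * 100 (%), rounded to one decimal place.

    A forecast is qualified when |forecast - observation| is smaller than the
    permissible error: 20% of the multi-year variability, taken here (an
    assumption) as max(history) - min(history).
    """
    permissible = 0.2 * (max(history) - min(history))
    m = sum(abs(f - o) < permissible for f, o in zip(forecasts, observations))
    n = len(forecasts)
    return round(100.0 * m / n, 1)


# Hypothetical values: permissible error = 0.2 * (10 - 0) = 2.0
print(qualified_rate([1.0, 5.0, 9.0], [2.0, 8.0, 9.0], history=[0.0, 4.0, 10.0]))
```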

Conclusions
In this paper, rainfall prediction models are constructed using meteorological data from a total of 94 meteorological stations in the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta regions from 1980 to 2021. In addition, the temporal distribution characteristics of rainfall relevant to transportation infrastructure over the past 42 years are analyzed. The main conclusions are as follows: (1) The raw rainfall data in the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta demonstration areas all conform to the single-sided exponential distribution. The original data from the three regions closely match their designated probability distribution types, which lays the foundation for transforming the data into standard normal space. (2) The input variables of the rainfall prediction model for the Sichuan-Chongqing region are dew point temperature DEWP, sea level pressure SLP, and mean wind speed WDSP. The input variables for the Beijing-Tianjin-Hebei-Shandong region are likewise DEWP, SLP, and WDSP, while those for the Yangtze River Delta region are mean temperature TEMP, sea level pressure SLP, and maximum sustained wind speed MXSPD. (3) According to the frequency of rainfall and the return period of precipitation concentration, the Yangtze River Delta region has a higher risk of rainstorm disaster than the other two regions. Notably, during July and August, the incidence of rainstorms and heavy rainstorm disasters in the Beijing-Tianjin-Hebei-Shandong region exceeds that in the Sichuan-Chongqing region. Furthermore, from 1980 to 2021, annual rainfall in the Sichuan-Chongqing region shows a fluctuating upward trend, while its rainstorm disasters show a fluctuating downward trend. In contrast, the Beijing-Tianjin-Hebei-Shandong and Yangtze River Delta regions experience a consistent upward trend in both annual rainfall and rainstorm disasters during the same period. (4) Rainfall prediction models for the three demonstration areas are developed by multiple linear regression. The models pass the overall significance test and the checks for independence, absence of multicollinearity, and normality of residuals, indicating a certain level of effectiveness and reliability. When predicting rainfall for 2022, the qualified prediction rates reach 61.7%, 58.6%, and 77.8% for the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta regions, respectively, indicating favorable prediction results of practical guiding value.
Because historical disaster information is collected and collated at different levels by different departments, such as meteorology and water conservancy, and some monitoring stations have a few missing measurements, discrepancies in results are not uncommon. This article takes the Sichuan-Chongqing, Beijing-Tianjin-Hebei-Shandong, and Yangtze River Delta regions as typical demonstration areas and constructs the corresponding rainfall prediction models by multiple linear regression, based on the rainfall meteorological data collected and collated with the wheat software. It also analyzes the temporal characteristics of rainfall from 1980 to 2021 with a view to the resilience assessment of transportation infrastructure, which helps to further understand rainfall trends over time in the three regions. To a certain extent, it provides a reference for predicting future rainfall trends and assessing the resilience of transportation infrastructure. Nonetheless, owing to data limitations, the rainfall analysis remains less than comprehensive, and the underlying factors contributing to these patterns are not thoroughly explored. Hence, more systematic and in-depth studies are needed in the future.

Figure 1. Schematic layout of the main skeleton of the national comprehensive three-dimensional transportation network (source of the base map: Outline of the National Comprehensive Three-Dimensional Transportation Network Plan-Annex).

Figure 2. Monthly distribution of the frequency of rainstorms/heavy rainstorms/extremely heavy rainstorms and above in three districts.

Figure 3. Temporal distribution of yearly rainfall in three regions.

Figure 4. Annual distribution of the frequency of rainstorms/heavy rainstorms/extremely heavy rainstorms and above in three districts.

Figure 5. Histograms of rainfall frequency and probability density function curves for the three regions.

Figure 6. Cumulative probability function curves for the three regions.

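| Test | Equation | Notes |
|---|---|---|
| F-test | $F = \dfrac{SSR/k}{SSE/(n-k-1)}$ | SSR is the square sum from regression, SSE is the square sum from residuals, k is the degree of freedom, and n is the number of samples |
| Goodness of fit | $R^2 = 1 - \dfrac{SSE}{SST}$ | SSE is the square sum from residuals, SST is the total square sum |
| D-W test | $D = \dfrac{\sum_{i=2}^{n}(u_i - u_{i-1})^2}{\sum_{i=1}^{n} u_i^2}$ | D is the Durbin-Watson statistic, $u_i$ is the value of the ith residual in the residual sequence |
| Collinearity test | $VIF = \dfrac{1}{T}$ | VIF is the variance inflation factor, T is the value of tolerance |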

Figure 7. Probability-Probability plots of standardized residuals in three regions.

Figure 8. Distribution of residuals against various parameters in the three regions.

Table 1. Details of rainfall data.

Table 3. Statistical summary of parameters for the three regions.

Table 4. Results of correlation analysis of rainfall system parameters in the Sichuan-Chongqing region.

Table 5. Results of correlation analysis of rainfall system parameters in the Beijing-Tianjin-Hebei-Shandong region.

Table 6. Results of correlation analysis of rainfall system parameters in the Yangtze River Delta region.

Table 7. Regression coefficients and significance tests.

Table 8. Equations of various testing methods.

Table 9. Model summary and overall significance test.